A corpus of general and specific sentences from news
Authors
Abstract
We present a corpus of sentences from news articles that are annotated as general or specific. We employed annotators on Amazon Mechanical Turk to mark sentences from three kinds of news articles: reports on events, finance news and science journalism. We introduce the resulting corpus, with a focus on annotator agreement, the proportion of general and specific sentences in the articles, and results for automatic classification of the two sentence types.
Similar resources
General Versus Specific Sentences: Automatic Identification and Application to Analysis of News Summaries
In this paper, we introduce the task of identifying general and specific sentences in news articles. Instead of embarking on a new annotation effort to obtain data for the task, we explore the possibility of leveraging existing large corpora annotated with discourse information to train a classifier. We introduce several classes of features that capture lexical and syntactic information, as wel...
Comparison of Different Machine Learning Methods for Extractive Persian Speech-to-Speech Summarization Without Using Transcripts
In this paper, extractive speech summarization using different machine learning algorithms was investigated. The task of speech summarization deals with extracting important and salient segments from speech in order to access, search, extract and browse speech files more easily and at lower cost. In this paper, a new method for speech summarization without using automatic speech recognitio...
Machine Translation of Sentences with Fixed Expressions
This paper presents a practical machine translation system based on sentence types for economic news stories. Conventional English-to-Japanese machine translation (MT) systems, which are rule-based, have difficulty translating certain types of Associated Press (AP) wire service news stories, such as economics and sports, because these topics include many fixed expressions (such as comp...
A Comparative Analysis of Institutional Identities in a Corpus of English and Persian News Interviews
Institutional identity as a concept in CDA is a field of study that deals with the identities that individuals in institutions obtain, one that merits deep research attention. News interviews as institutional instances can be analyzed based on the impersonal structures because interviewees see themselves as part of the institution and they may not take responsibility when they encounter problem...
Bootstrapping Relation Extraction Using Parallel News Articles
Relation extraction is the task of finding entities in text connected by semantic relations. Bootstrapping approaches to relation extraction have gained considerable attention in recent years. These approaches are built with an underlying assumption, that when a pair of words is known to be related in a specific way, sentences containing those words are likely to express that relationship. Ther...